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ABSTRACT 



High redshift measurements of the baryonic acoustic oscillation scale (BAO) from 
large Lya forest surveys represent the next frontier of dark energy studies. As 
part of this effort, efficient simulations of the BAO signature from the Lya forest 
will be required. We construct a model for producing fast, large volume simu- 
lations of the Lya forest for this purpose. Utilising a calibrated semi-analytic 
approach, we are able to run very large simulations in 1 Gpc 3 volumes which 
fully resolve the Jeans scale in less than a day on a desktop PC using a GPU 
enabled version of our code. The Lya forest spectra extracted from our semi- 
analytical simulations are in excellent agreement with those obtained from a fully 
hydrodynamical reference simulation. Furthermore, we find our simulated data 
are in broad agreement with observational measurements of the flux probabil- 
ity distribution and ID flux power spectrum. We are able to correctly recover 
the input BAO scale from the 3D Lya flux power spectrum measured from our 
simulated data, and estimate that a BOSS-like 10 4 deg 2 survey with ~ 15 back- 
ground sources per square degree and a signal-to-noise of ~ 5 per pixel should 
achieve a measurement of the BAO scale to within ~1.4 per cent. We also use our 
simulations to provide simple power-law expressions for estimating the fractional 
error on the BAO scale on varying the signal-to-noise and the number density 
of background sources. The speed and flexibility of our approach is well suited 
for exploring parameter space and the impact of observational and astrophysical 
systematics on the recovery of the BAO signature from forthcoming large scale 
spectroscopic surveys. 

Key words: intcrgalactic medium - quasars: absorption lines - large-scale struc- 
ture of Universe: Cosmology theory. 



1 INTRODUCTION 

The Lya forest is the series of absorption lines arising 
from the resonant scattering of redshifted Lya photons 
emitted from a background source by the intervening neu- 
tral hydrogen in the intergalactic medium ( Rauch|1998 1. 
The neutral hydrogen is in photo-ionisation equilibrium 
with the ionising background produced by the integrated 
emission from galaxies and quasars (e.g. |Meiksin||2009"l ). 
Numerical simulations and semi-analytical models have 
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demonstrated that the Lya forest is a valuable probe of 
the underlying matter density field in the intergalactic 
medium (IGM), tracing the so-called "cosmic web" of 



large scale structure (|Cen et al.||1994| 


Hernquist et al.| 


1996 Bi & Davidsen|1997 Theuns et al. 


19981. Themea- 



surement of fluctuations in the transmitted Lya flux in 
samples of quasar spec tra (|McDonald et al.||2000| | Croft | 



et al. 2002 Kim et al. 20041, combined with a suitable 



model to connect the observed flux to the underlying 
matter distribution, thus enable the matter power spec- 
trum (PS) to be inferred on small scales along the line-of- 
sight ( |Croft et al.|2002] |Viel et al.|2004||McDonafd et al 
[2uu5] ). 

In recent years, it has been proposed that the Lya 
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forest of absorption lines may also be used to detect the 
signature of baryonic acoustic oscillations (BAO) in the 



large scale structure of the IGM (e.g. McDonald & Eisen- 



stein 



20071. BAOs provide a standard ruler which may 
be used to examine the geometry of the Universe and the 
nature of dark energy, and are already observed in the 
Cosmic Microwave Background (CMB) at z ~ 1100 and 



the clustering of galaxies at low redshift ( Eisenstein et al 



2005l|Cole et al.|2005|[Hutsi|2006||Blake et al.|2007 



Pad- 



manabhan et al.|2007|lPercival et al.|2007||2010||Beutler 



et al.||2011| |Blake et al.||2011[ ). The characteristic BAO 
signal is spatially correlated on scales of order ~ 150 co- 
moving Mpc; large survey volumes are therefore required 
to provide adequate statistics for the detection of this 
scale. Existing low redshift studies are subject to a de- 
generacy between the space-time curvature Qk and an 
evolving dark energy equation of state ( |Zhan et al.|2005| . 
Studying BAO at higher redshift can help alleviate this 
difficulty. However, the extension of galaxy surveys to 
higher redshifts becomes increasingly expensive because 
of the significant telescope resources required to observe 
a sufficient number of (fainter) galaxies. 

An alternate measurement of large scale structure at 
z ~ 2 — 3 is provided by the Lya forest. There are several 
advantages to using the Lya forest as a probe of BAO. 
At higher redshifts the evolution of Qk and dark energy 
differ (ftk evolves as oc (1 + z) 2 , whereas dark energy 
is expected to evolve more slowly with redshift) allowing 
one to break this degeneracy and to obtain constraints on 
the dark energy equation of state at this epoch. Further- 
more, the detection of suitable background sources be- 
comes significantly easier due to the peak in quasar num- 
ber density observed at z ~ 2.2 (Richards et al. 20061. 



McDonald & Eisenstein ( 2007 1 demonstrated that low to 



medium signal-to-noise (S/N) spectra of a large number 
of quasars with sufficient density per square degree could 
be used to detect the BAO signal with the Lya forest. 
As an example, these authors estimated that the BAO 
scale could be measured in a 2000 square degree survey 
with 40 quasars per square degree. The BAO signal may 
then be statistically extracted from such a sample via 
either the correlation function (measured as a character- 
istic peak) or the power spectrum (measured as a char- 



acteristic oscillation period). More recently McQuinn & 



White (2011 I demonstrated the sensitivity of future Lya 
forest surveys to the flux correlation function, and inves- 
tigated the optimal survey configuration for estimating 
the Lya forest correlations including the use of galax- 
ies as background sources to boost the Lya forest survey 
sensitivity. These authors also estimate the sensitivity of 
a future BAO measurement to the systematics associated 
with Lya forest surveys. 

In anticipation of the necessary large volume Lya 
forest surveys such as the Baryon Oscillation Spectro- 
scopic Survey (BOSS; Schlegel et al.|[2009| |Slosar et al. 
2011 L simulation work has also been carried out to con- 



struct synthetic Lya forest spectra for use in construct- 
ing mock surveys. Recently, large volume, high resolution 
dark matter (DM) N-body simulations have been per- 
formed by iWhite et al.l (120101) and ISlosar et all (|2009|), 



while Norman et al. ( 2009 ) have presented fully hydrody- 



purpose. In this work we outline a method for producing 
fast (i.e. less than one day), similarly large volume, high 
resolution simulations using a single desktop PC with 
a graphics processing unit (GPU). Our approach, which 
is based on widely used semi-analytical models for the 



Lya forest 


Bi 


1993 


Reisenegger & Miralda-Escude 


1995 


Gnedin & ] 


Ul 


1996 


Bi & Davidsen|1997||Hui et al. 


1997 


Gnedin & ] 


iui 


1998 


Choudhury et al.|2001||Matarrese & 


Mohayaee 


2002 


Viel et al.||2002a|b|), does not capture 



the mildly non-linear effects of the Lya forest modelled 
in the N-body and hydrodynamical simulations. Never- 
theless, this method is well suited for studying the statis- 
tics of Lya forest absorption on large scales, where the 
assumption of linear evolution is a reasonable approxima- 
tion. Our approach is therefore complimentary to more 
accurate but very expensive numerical simulations. We 
note another semi-analytical approach for studying the 
BAO in the Lya forest has also been recently presented 



by Kitaura et al. (20101 



This paper is organised as follows. In Section [2j we 
provide a summary of the semi-analytical model and the 
method used for generating our synthetic Lya forest spec- 
tra. In Section [3] we compare the semi-analytical density 
and velocity fields to a fully hydrodynamical simulation. 
In Section|4]we describe the simulations used in this work 
in more detail. In Section|5]we test our model by compari- 
son to a fully hydrodynamical simulation and a selection 
of observational data, and in Section [6] we extract the 
BAO signature from a mock Lya BAO survey and com- 
pare our approach to other recently published work. In 
Section [7] we provide scaling relations for the fractional 
error on the recovery of the BAO scale for Lya forest 
BAO surveys. Finally, in Section [8] we finish with our 
closing remarks. An appendix detailing the performance 
and implementation of our GPU enabled code is included 
at the end of the paper. 



2 SEMI-ANALYTICAL MODEL FOR THE 
IGM 

Many approximate techniques have been proposed for 
modelling the mildly non-linear, low column density IGM 
from an initial dark matter distribution. Approaches 



taken include the lognormal method ( Coles fc Jones|1991[ ) 
applied to the linear DM distribution to mimic non-linear 



behaviour (Bi 1993 Bi & Davidsen 1997 Choudhury 



et aLpOOl \ , the rank-ordered mapping of linear to non- 



linear densities using a calibration hydrodynamical simu- 
lation ( |Viel et al.|2002b[) or using the Zel'dovich approx- 



imation ( ZerDovich||1970"] ) to generate the DM distribu- 



tion. Many of these models also subsequently smooth the 
initial DM density field on a scale related to the Jeans 
length in the low density IGM, accounting for the ef- 
fect of gas pressure on the baryon distribution on small 



scales ((Reisenegger & Miralda-Escude 


1995 


Gnedin & 


Hui||1996 ( 1998; |Hui et al.||1997| |Matarrese & Mohayaee 


2002 


Viel et al.||2002a 


1. 



In this work we follow Viel et al. (2002b I, who in 



namical simulations in slightly smaller volumes for this 



vestigated various approaches for producing more accu- 
rate models for the gas density distribution using semi- 
analytical models. These authors found they could better 
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mimic the non-linear DM distribution by taking the DM 
density probability distribution function (PDF) from a 
linear simulation and performing a rank-ordered mapping 
to the corresponding distribution obtained from a hydro- 
dynamical simulation. We adopt a similar approach here, 
and use a hydrodynamical simulation to calibrate the one 
point distribution for the density field. This ensures we 
obtain accurate one and two point statistics for most of 
the absorption in the resulting Lya spectra. However, as 



noted by Viel et al. (2002b I the two point statistics for 
regions of strong absorption with F — e~ T < 0.1, where 
non-linear effects are important, will not be properly cap- 
tured by this model. In this paper, we demonstrate this 
limitation will not present a serious impediment to ex- 
tracting the BAO signal on large scales from our simu- 
lated data. We stress, however, that our semi-analytical 
simulations are based on linear theory and they do not in- 
clude large scale non-linear evolution and BAO damping 
as described in the detailed N-body work of |Seo &: Eisen-| 
stein (2007). Rather, our models can be used to address 
the non-gravitational issues associated with the Lya for- 
est, and will need to be coupled with N-body studies of 
the gravitational evolution of the BAO in order to make 
detailed comparisons with real data. 



We use model L3 of Bolton et al. (2010) as our cal- 



ibration hydrodynamical simulation, with a snapshot at 
z = 2.976 generated using the parallel Tree-SPH code 
GADGET-3 ( |Springel|2005| >. This simulation assumes a 
ACDM cosmology with, h = 0.72, O m = 0.26, D. A = 0.74, 
n b = 0.0444, n s = 0.96 and a 8 = 0.8. The calibration 
simulation has a box size of 40 h~ Mpc and contains 
2 x 512 3 gas and DM particles, yielding a gas particle 
mass resolution of 5.9 x 10 6 /i _1 AfQ. Importantly, this 
particular box size and mass resolution is sufficient for 
resolving the gas densities responsible for the Lya forest 
at z ~ 3 (Bolton fc Becker|[2509] ) . We note that in this 
work we assume a constant redshift of z — 3 for our sim- 
ulations. The accuracy to which the BAO scale can be 
recovered is redshift dependent and as such we would ex- 
pect our results to differ slightly across the redshift range 
of z = 2 - 3. 



2.1 Generation of the density field 

We begin by generating a linear DM density field in 
Fourier space within a cubic simulation volume accord- 



ing to the transfer function of Eisenstein & Hu ( 1998 1 



The density field is then linearly evolved using the linear 
growth factor D+(z) to the redshift of interest. Follow- 
ing Bi & Davidsen (19971, we account for the effect of 



gas pressure on the baryons by smoothing the linear DM 
density field with the kernel 



Sh(k, z) 



(k,z) 



l + k 2 /k 2 F (z)- 



(1) 



where kp is the filtering scale, which is related to the 
comoving Jeans scale kj via kp — ekj. The comoving 
Jeans scale is given by 



kj = Ho 



3/im p n m (l + z) 1 17 " 
2 7 fc B T(z) 



(2) 



where p is the reduced mass and T(z) is the gas temper- 
ature. The smoothing scale kp is a free parameter in the 
simulations, and effectively accounts for the finite delay 
between heating and the subsequent pressure response of 
the gas, which is dependent on the specific reionisation 
history ( |Gnedin fc Hui|1998||Desjacques fc Nusser|2005 1. 
In this work we choose kF = 6.5 Mpc - , which we find 
provides good agreement with spectra extracted from 
our calibration hydrodynamical simulation and the ob- 
servational data (see Section[5|. In Section[3]we compare 
the generated linear density field and the corresponding 
non-linear rank-ordered density field to the calibration 
hydrodynamical simulation using the three dimensional 
matter power spectrum. 



2.2 Generation of the velocity field 

In addition to the density field, the Lya forest is also 
sensitive to the peculiar velocity field. We generate the 
linear peculiar velocity field, based on our linear DM den- 
sity field, using the following expression, 



viGM(k, z) = E+(z)ik/k 2 SuM(k), 



(3) 



where viom is expressed relative to the comoving 
wavenumber k, E+(z) = H(z)f(Q nl ,Q\)D+(z)/(l + z) 
and /(fi m , Qa) = — dlnD+(z)/dln(l + z). We compare the 
linear peculiar velocities generated from Equation [3] with 
the peculiar velocities in the calibration hydrodynamical 
simulation in more detail Section [3] 



2.3 Generation of Lya forest spectra 

The calibrated baryonic density and peculiar velocity 
fields are next used to generate the synthetic Lya for- 
est spectra. We use the approach outlined in |Hui et al.| 
( 1997 1, assuming that the neutral hydrogen in the ionised 



IGM is in photo-ionisation equilibrium; this should be a 
reasonable approximation at z ~ 3. The proper number 
density of neutral hydrogen in the IGM is 



nm(x,z) 



7.24 x 10- 6 n H ( Z ) 



-°- 7 'n b A a 



i 



[1 + S h (x,z)] 



1 



0.024 

3 



(4) 



where bh is the proper mean density of neutral hydrogen, 
F_i2 is the hydrogen photo-ionisation rate in units of 
10 -12 s — 1 and 5b is the baryonic overdensity and is a 
function of the comoving position x. We take the case- 
A recombination rate of neutral hydrogen to be 4.17 X 



1 |Viel et al.| | |2002b| account for gas pressure by using a third 
order polynomial fit to the relationship between the baryon 
and DM density in their calibration hydrodynamical simula- 
tion, rather than applying a global smoothing to the density 
field in Fourier space as we do. However, we found that apply- 
ing a global smoothing scale to the linear DM density field, 
and then mapping from the DM density PDF directly to the 
baryon PDF of the hydrodynamical simulation, produced bet- 
ter agreement for both individual lines of sight and for Lya 
flux statistics. 
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10~ 13 ( 10 "4jy ) cm 3 s _1 . The temperature of the low 
density IGM, S + 1 < 10, is then modelled assuming a 
power-law temperature-density relation 
\7-i 



T = T (l + 8 h 



(5) 



where 7 is the polytropic index describing the slope of the 
temperature-density relation. This relation is expected 
to arise following reionisation due to the interplay be- 
tween photo-heating and adiabatic cooling, where typ- 
ically 1 < 7 < 1.6 (Hui et al. 1997 Valageas et al 



2002| ). However, at densities 3 + 1 > 10, radiative cooling 
becomes more efficient and the power-law temperature 
density relation is no longer a good approximation. We 
therefore employ a pivot point at 6 + 1 = 10, below which 
the typical temperature-density relation still holds, but 
above which we set the temperature to be constant, such 
that T(S + 1 > 10) = T(<5 + 1 = 10). Note, however, 
shock heating will introduce some scatter into this rela- 
tion. Furthermore, Hell reionisation, which is expected 
to end around z ~ 3, may also produce a relationship 
between temperature and density which is more com- 



plicated than this tight power-law (Bolton et al. 2009 
McQuinn et al.| [2009). There is also some observational 



evidence for 7 < 1 (i.e. an inverted temperature-density 



relation, Viel et al. 20091, although it appears difficult 
to achieve this via heating during Hell reionisation alone 
McQuinn et al.||2009[ ). However, 



( Bolton et al.|2009 



defer the discussion of such possible systematic uncer- 
tainties to a future study. 

We next generate the transmitted Lya flux along the 
line-of-sight for our synthetic Lya forest spectra using the 
relation F — e~ T , where r is the Lya optical depth. The 
optical depth of the synthetic Lya spectra is computed 
using (e.g. |Theuns et al.|1998| ) 



,(i) = 



*8R 



rl/2 



E 



nm 
bmU) 



exp 



vu(i) - u(j) 
bm(j) 



(6) 



where i and j denote pixels along the line of sight through 
the simulation volume, 5R is the pixel width in proper 
coordinates, a a — 4.48 x 10 -18 cm 2 is the scattering 

T \ 1 / 2 

cross-section for Lya photons, bm = ^Sf is the 
Doppler parameter describing the thermal width of the 
line profiles, vh is the Hubble velocity and u(j) is the 
total velocity given by the summation of the Hubble 
flow and the peculiar velocity along the line of sight, 

u(J) = «hC?) + V p ec(J)- 

Finally, once we have our optical depth along the 
line of sight, we renormalise the optical depths of all 
of our spectra to match the observed mean flux of the 
Lya forest at z ~ 3. We define the transmitted flux as 
(F) = (e _Ar ), where A is a normalisation constant to 
be solved for. This modification is equivalent to rescaling 
the HI photo-ionisation rate produced by the UV back- 
ground. We solve for A by summation over all generated 
spectra and iterate A until the mean transmitted flux 
from the simulated spectra matches the observationally 
measured value. We match our mean transmitted flux to 
that observed by Kim et al. (20071, corresponding to a 



mean transmitted flux of (F) = 0.72 or an effective opti- 
cal depth of r e ff = 0.329 at z = 3. Lastly, we note that 
in the generation of our Lya spectra, we do not include 




k (Mpc 1 ) 



Figure 1. The dimcnsionless dark matter PS of our 40 h~ 1 
Mpc, 512 3 semi-analytical simulation before (dot-dashed) and 
after (dotted) the rank-ordered mapping of the linear den- 
sity field with our calibration hydrodynamical simulation. For 
comparison we also show the theoretical linear input PS (solid) 
and the dark matter PS of the calibration hydrodynamical 
simulation (dashed). 



the redshift evolution of the effective optical depth along 
individual lines-of-sight. The effect of metal absorption 
lines and the damping wings originating from high col- 
umn density absorption systems are also excluded. 



3 SEMI-ANALYTIC MODEL 
PERFORMANCE 

Before comparing the Lya forest simulations used in this 
work with observations, we first verify the performance 
of our rank-ordered semi-analytic model. Firstly we in- 
vestigate the effect that rank-ordered mapping of the lin- 
ear density field has on the matter power spectrum, and 
secondly, we check that our linear peculiar velocity field 
produces a reasonable description of the peculiar velocity 
field when compared to hydrodynamical simulations. 

3.1 Matter power spectrum 

In Figure [T] we illustrate the effect of the rank-ordering 
procedure on the linear dark matter density field by 
comparing the dimensionless dark matter PS from our 
semi-analytic simulations to the dark matter PS from 
the calibration hydrodynamical simulation. We find that 
the rank-ordered linear matter power spectrum correctly 
recovers the large scale behaviour when compared to 
the non-linear dark matter PS from the hydrodynami- 
cal simulations. For smaller spatial scales, the non-linear 
density from the rank-ordered method underpredicts the 
correct non-linear behaviour. However underproducing 
the small-scale power will have no significant effect on 
the BAO scale. Thus, provided we are able to maintain 
roughly the same spatial resolution in our large-scale sim- 
ulations as in our calibration simulation, the rank-ordered 
mapping procedure will perform well at reproducing the 
correct large-scale behaviour required for accurately sim- 
ulating the recovery of the BAO scale. 
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Figure 2. The linear peculiar velocity PDF's of our 40 h~ 1 Mpc, 512 3 semi-analytical simulation. From left to right are the 
peculiar velocity PDF's in the x, y and z directions of our simulation (dashed) and the calibration hydrodynamical simulation 
(solid). The linear peculiar velocity field matches reasonably well in the low density regions which have smaller peculiar velocities. 
However, the semi-analytical model does not produce the larger (rarer) peculiar velocities expected in non-linear, overdense regions. 



3.2 Peculiar velocities 

As discussed in the previous section, we also produce the 
linear peculiar velocity field prior to the rank-ordered 
mapping of the density field. In Figure [2] we compare the 
resulting linear peculiar velocity PDFs to those from the 
hydrodynamical simulation. We find the linear peculiar 
velocity field to be isotropic, and that the velocities from 
our simulations match quite well in the low density re- 
gions. However as expected, higher density regions with 
larger (but rarer) peculiar velocities (which correspond 
to regions of infall in the hydrodynamical simulation) are 
not correctly captured in this model. 



4 Lya FOREST SIMULATIONS 

We construct two different Lya forest simulations in this 
work; a high resolution simulation for comparison to our 
reference hydrodynamical simulation, and a low resolu- 
tion, large volume simulation for recovery of the BAO 
signature. Both models are compared to published mea- 
surements of the Lya flux probability distribution func- 
tion (PDF) and the ID flux power spectrum (PS). In both 
simulations we assume a temperature-density relation 
with 7 = 1.3 and we set the temperature To = 1.7x 10 4 K, 
broadly consistent with the observational constraints on 
the IGM thermal state at z = 3 ( Schaye et al.|2000 Lidz 
et al.|2010| |Becker et aL|2011[ ). 



4.1 High resolution simulations 

In our high resolution model, we simulate a 40 h~ 
Mpc simulation box, containing 512 3 pixels. The box 
size and resolution are chosen to mimic our calibration 
GADGET-3 hydrodynamical simulation, although we 
note that the resolution comparison will not be exact 
due to the spatially adaptive resolution of GADGET-3. 
In order to compare our simulated Lya forest spectra to 
observed high resolution data, we must also process our 



simulated spectra to mimic the properties of the data. 
We convolve the spectra with a Gaussian with a FWHM 
= 7 km s _1 , and resample our spectra onto 0.05 A bins. 
We finally add Gaussian distributed noise assuming S/N 
~ 50 per pixel. The same procedure is also performed 
on the Lya spectra generated from the hydrodynamical 
simulation. 



4.2 Low resolution, large volume simulations 

Recovery of the BAO signal from our simulation requires 
that we also simulate two large volume simulations at 
lower resolution. One simulation is generated using a 
matter PS containing baryon oscillations, and the other 
has the baryon oscillations suppressed. We use the trans- 



fer functions of Eisenstein & Hu ( 1998 1 for this purpose. 
Each simulation is generated in a 1 Gpc 3 comoving box, 
containing 4096 3 pixels (chosen to be comparable with 
the 4000 pixels per spectra computed using the roadrun- 



ner supercomputer by White et al. 20101. Each pixel is 
therefore ~ 244 comoving kpc. In comparison, the comov- 
ing Jeans smoothing scale (given by equation 2) at z = 3 
is ~ 760 kpc (and our filtering scale &f = 6.5 Mpc -1 cor- 
responds to a comoving scale of ~ 966 kpc). Importantly, 
this implies our large volume simulations adequately re- 
solve the Jeans scale at mean density. 

In order to mimic the low resolution Lya forest 
data expected in forthcoming BAO surveys we also con- 
volve our spectra with a Gaussian with a FWHM of 
~ 3.63 A ( ~ 224kms _1 ) and resample the spectra onto 
1.0375 A bins. These values are representative of BOSS 



spectra (Eisenstein et al. 20111. We also add Gaussian 



distributed noise, S/N = 5 per pixel. Finally we ensure 
each synthetic line of sight corresponds to the pathlength 
between the quasar rest frame Lya and Ly/3 transitions 
only, minus the 3000 km s^ 1 blueward of Lya in the 
quasar rest frame. The latter accounts for the quasar 
proximity effect in the observational data. An example 
Lya forest sightline drawn from our simulation is dis- 
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Figure 3. Example synthetic Lya forest spectrum generated from our 1 Gpc 3 simulation with 4096 3 pixels. Upper panel: The 
transmitted flux along the line-of-sight predicted by the simulation. Middle panel: The same spectrum after convolution with a 
Gaussian instrument profile with FWHM= 224 km s _1 . Lower panel: The fully processed spectrum, resamplcd onto pixels of width 
~ 1 A with Gaussian distributed noise corresponding to a signal-to-noise of 5 per pixel. This spectrum is representative of the 
mock data set we use to recover the BAO signal from the Lya forest. Note that the redshift evolution of the effective optical depth 
along the line-of-sight is not included in this model. 



played in Figure [3] It is these spectra which will be used 
in the BAO recovery described in Section [5] Before this, 
however, we now proceed to perform consistency checks 
on our simulation output by comparing the synthetic Lya 
forest spectra to measurements of Lya flux statistics. 



5 FLUX STATISTICS 
5.1 Available data 

We shall compare our simulations to the measured flux 



PDF (|McDonald et aI.|2000l|Kim~et al.|2007||Desjacques 



et al 



et al. 



2007[) and the ID line -of-si ght flux PS (|McDonald 
2000||Croft et al.|2002| >. The |Kim et"aT| < |2007^ sain" 



pie contains 18 high resolution quasar spectra obtained 
with the VLT/Ultraviolet and Visual Echelle Spectro- 
graph (UVES), specifically chosen to have a signal-to- 
noise of at least 30 — 50 and to fully sample the Lya 



forest region. In comparison, the McDonald et al. ( 2000 1 



sample contains 8 high resolution quasar spectra obtained 
using the High Resolution Echelle Spectrometer (HIRES) 



at the Keck telescope. However, the |Croft et al.| ( |2002[ > 
sample contains both high resolution spectra from Keck 
HIRES and 23 low resolution spectra obtained with the 
Low Resolution Imaging Spectrometer (LRIS). The |Des^| 
jacques et al. ( 2007[ ) sample contains 3492 low resolution 
quasar spectra from the Sloan Digital Sky Survey (SDSS) 
data release three (DR3). 



5.2 The flux probability distribution function 

We first compare the flux PDF constructed from our syn- 
thetic Lycv forest spectra to measurements at z ~ 3 ob- 
tained from high resolution data in Figure g] The data 
points correspond to the measurements presented by |Kim| 
et al.| (|2007|) and |McDonald et al.| (|2000|). Note that 



Kim et al. (2007) and McDonald et al. (20001 use differ- 



ent prescriptions for the removal of the metal lines. The 



flux PDF measurement performed by Kim et al. (20071 



removes suspected metal lines by Voigt profile fitting, 



whereas the McDonald et al. (2000) sample instead ex- 
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Figure 4. The flux PDF generated from our 40 h- 1 Mpc, 512 3 
semi-analytical simulation (solid curve) and our calibration 
hydrodynamical simulation (dashed curve), compared to ob- 
servational measurements made using high resolution quasar 
spectr a by|Kim et al.| | |2007| (closed circles) and |McDonald| 
|et al.| ( |2000| ) (open circles) at z ~ 3. The simulated data has 
been processed to resemble the resolution and S/N of the ob- 
servational data. For comparison, the flux PDF from our large 
volume (1 Gpc, 4096 3 ) semi-analytical simulation is also dis- 
played (dot-dashed curve). The spectra have not been pro- 
cessed to resemble observational data in this latter instance. 
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Desjacques et 
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Figure 5. The Lya forest flux PDF measured from the low 
resolution SDSS DR3 data at z = 3 l |Desjacques et al.|[2007| l 
is shown by the dashed curve with data points. The solid 
curve compares the flux PDF generated from our 1 Gpc semi- 
analytical simulation after the spectra have been modified to 
match the resolution and noise properties of the data. Note 
that the level of disagreement between the simulations and ob- 
servational data is similar to that found by Desjacques et al. 
( |2007| l. These authors noted that altering the continuum level 
on the synthetic spectra can significantly improve agreement 
with the data. 



cises regions which are suspected of being contaminated 
by metal lines. 

Following the observational measurements, we com- 
pute the flux PDF from our simulations by separating 
the transmitted flux into 21 equally spaced flux bins of 
width AF = 0.05, from F = to F = 1 (i.e. at F = 0, the 
first data point contains flux from —0.025 < F < 0.025). 
The solid and dashed curves in Figure [4] correspond to 
the flux PDF computed from the synthetic Lya spectra 
extracted from our high resolution semi-analytical sim- 
ulation and the full hydrodynamical simulation, respec- 
tively. These spectra have been processed to resemble the 
observational data, as described in Section [4. 1| The dot- 
dashed curve instead shows the flux PDF computed from 
the spectra extracted from the 1 Gpc 3 simulation box be- 
fore the data are processed to resemble the low resolution 
data. 

The PDF generated by our high resolution semi- 
analytical simulation matches remarkably well with the 
hydrodynamical simulation, with only a small difference 
observed in the PDF at F — 0.8-1. Furthermore, the 
simulations are also in broad agreement with the |Mc-| 
Donald et al. ( 2000 1 data, although they do not agree 



so well with the more recent observations of Ki m et al.l 
. On the other hand it has been shown by |Bolton| 
( 2008 1 that a better match to the observational 



(2007 



et al. 



data of Kim et al. (20071 may be achieved when an in- 
verted temperature-density relation (7 < 1) is assumed; 
we have instead adopted 7 = 1.3 in this work. The lower 
resolution, large volume simulation also matches the high 
resolution simulations reasonably well. 

As an additional consistency check, in Figure [5] we 
also compare the flux PDF constructed from our large, 
low resolution simulation to the flux PDF measured from 
low resolution data by Desjacques et al. (20071. In or- 



der to compare our simulated spectra to the Desjacques 
|et al. I ( |2007[ ) flux PDF, we process our simulated spectra 
to mimic the SDSS DR3 data by convolving the flux by 
a Gaussian with FWHM of 170 km s _1 , resampled onto 
~ lA bins and adding Gaussian distributed noise with 
S/N = 3.8 per pixel. In contrast to the high resolution 
PDF, the agreement between our simulated low resolu- 
tion spectra and the observed flux PDF from the low res- 
olution data is rather poor. However, [Desjacque s et al.] 
(20071 found a very similar disagreement between their 
simulations and the flux PDF. These authors attributed 
this difference to the single power law approximation used 
for the quasar continuum level in the observational data. 



Desjacques et al. (20071 found they could improve the 



agreement between their observations and simulation by 
introducing a break in the continuum slope, with a de- 
crease in the mean quasar continuum (~ 10 — 15 per cent) 
and introducing residual scatter (~ 20 per cent) into the 
continuum level. 

Finally, we note that the flux PDF is sensitive to 
the free parameters which are inputs to our simulation 
model. Changes to either the smoothing scale, &f, or the 
slope of the temperature-density relation, 7, in particular 
will alter the shape of the PDF, although the flux PDF is 
relatively insensitive to the assumed temperature at mean 
density To for fixed T e s ( Bolton et al.|2008 l. For example, 
we could match the data of Kim et al. ( 2007 1 more closely 



if we allowed the free parameters in our model such as 7 
or A;_f to vary. 



5.3 The ID flux power spectrum 

A more stringent test of the simulated data is the compar- 
ison of the synthetic spectra to higher order flux statis- 
tics such as the line-of-sight flux PS. Figure [5] displays 



© 0000 RAS, MNRAS 000, 000-000 



8 B. Greig et al. 



the measurements of the ID flux PS made by [McDonald 
|et al.| p000| ) and |Croft et al.| (|2002f at z = 3. We gen- 
erate the ID flux PS along the line-of-sight for both the 
high resolution semi-analytical simulation (solid curve) 
and the hydrodynamical simulations (dashed curve) for 
comparison to the data. The simulated results are again 
in excellent agreement, and for our choice of 7, To and 
k F (1.3, 1.7 x 10 4 K, 6.5 Mpc -1 ) the semi-analytical and 
hydrodynamical simulations both match well with the 



observations of Croft et al. (2002) 



We also compare the ID flux PS computed from the 
lower resolution, large volume simulation to the data. The 
dot-dashed curve in Figure [6] displays the ID PS com- 
puted from the unprocessed spectra, which agrees well 
with the observational data and smaller box simulations. 
Note, however, the PS in this case extends to much larger 
spatial scales. The ID flux PS for the Lya forest spec- 
tra that has been processed to resemble low resolution 
data is shown by the dotted curve in Figure [6] On larger 
scales this matches the power of the high resolution spec- 
tra from the 1 Gpc simulation well, demonstrating that 
lower resolution leaves the large spatial scales largely un- 
altered, as is required for measurements of the BAO scale. 
Note, however, the low resolution PS exhibits less power 
at smaller scales as expected, and also flattens out at 
k > 0.01 s km" 1 due to noise. 

As in the case of the flux PDF, the flux PS is sensitive 
to our choice of free parameters. In particular, the shape 
of the flux PDF is highly sensitive to the smoothing scale 
kF (for k > 0.01 s km -1 ). For a smaller kF, the power 
on small scales decreases as the underlying density field 
becomes smoother. The flux PS is also sensitive to both 
the slope of the temperature-density relation 7 and the 
temperature To, and the power decreases on small scales 
for both an increasing 7 and temperature To at z = 3 (see 
also Viel et al. 120041). However, in this work our main goal 



is to demonstrate that our simulations provide a model of 
the Lya forest which is adequate for extracting the BAO 
signature. The broad agreement with the observational 
data and previous modelling suggests this is indeed the 
case. 



6 EXTRACTING THE BAO SIGNAL FROM 
THE Lya FOREST 

The broad agreement between our semi-analytic simula- 
tions and the data, as well as the excellent agreement of 
our models with our reference hydrodynamical simula- 
tion, gives us confidence that our semi-analytical simu- 
lations provide a good representation of the Lya forest 
on large scales. We therefore now proceed to extract the 
BAO scale from our large scale simulations of the z ~ 3 
Lya forest. 

6.1 Mock data set 

We generate mock data sets using spectra sub-samples 
selected at random from a total sample of 100,000 lines 
of sight drawn parallel to the box boundaries of our 1 
Gpc simulation volume. The total simulation volume of 
1 Gpc 3 corresponds to a survey area of ~ 79 deg 2 at 
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Figure 6. The ID line-of-sight flux power spectrum gener- 
ated from our high resolution semi-analytical simulation model 
(solid curve) compared to the result from our reference hydro- 
dynamical simulation (dashed curve). Observational measure- 
ments by |McDonald et al.| ( |2000| | and |Croft et ah] ( |2002| l at 
z = 3 are shown by the open and closed circles, respectively. 
The ID flux PS from our large 1 Gpc simulation box before 
(dot-dashed curve) and after (dotted curve) the spectra are de- 
graded to resemble low resolution Lya forest spectra are also 
displayed. The degraded spectra resemble the line-of-sight dis- 
played in the lower panel of Figure [3] 



z = 3. We make the approximation that all background 
sources are at the redshift of the simulation (i.e. z = 3), 
the sight-lines are parallel, and that the spectra all have 
the same usable pathlength (i.e. the distance between the 
quasar rest frame Lya and Ly/3 transitions minus 3000 
km s _1 to account for the proximity effect). Note that 
because our method allows us to generate many simula- 
tions using different realisations for the density field at 
various redshifts quickly and efficiently, we may easily ex- 
tend the volume of a mock survey data set as required. 
Indeed much larger survey volumes will be required in 
practice to extract the BAO signature from observational 
data (e.g. |McDonald fc Eisenstein|2007 |. 



6.2 



Reconstructing the 3D PS from Lya 
spectra 



In practice, as there are only ever a limited number of 
skewers (quasar sight-lines) drawn through a survey vol- 
ume, we cannot directly measure the full 3D Lya flux PS 
from the data. In order to estimate the full 3D Lya flux 
PS from our simulations we must therefore reconstruct 
the true 3D Lya flux PS from the 3D Lya PS computed 
from the individual sight-lines, minus a weighted term 
which introduces aliasing-like noise to the analysis. We 



adopt the approach of McDonald & Eisenstein (20071 



and use the following expression for the reconstruction: 

P F ,truc(k) = P F ,box(fc) - PF,lD(fc||)P 2 I3,w(fe_ L ) - P N . (7) 

Here we measure the full 3D PS, PF,box(fc), of our sim- 
ulation volume using only the information given by the 
individual lines-of-sight in the mock survey. We then sub- 
tract off a '3D' PS generated by the multiplication of 
a 2D weighting PS, P2D,w(fcj_), indicating the positions 
of the Lya spectra and the ID Lya flux PS, iV,iD(k||), 
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measured from the synthetic Lya spectra along the line- 
of-sight. Finally, we also subtract off the noise PS, Pn, 
which is generated from the noise along the line-of-sight 
multiplied by the 2D weighting PS. After completion of 
this reconstruction process we then spherically average 
the resulting PS. We bin the spherically averaged 3D PS 
using concentric spheres of radii equal to multiples of the 
Nyquist frequency. Since the Fourier modes are discrete 
the position of each fc-bin is calculated by taking the av- 
erage of all k values which fall in each concentric sphere. 
This binning strategy is to ensure we do not introduce 
a shift in the BAO signal that can affect the recovered 
value of the BAO scale. 

We complete the reconstruction step defined by 
equation |7| on our mock data sets from two simulations; 
one using the matter PS including baryonic oscillations 
and another with the smoothed reference PS. By taking 
the ratio of these two reconstructed 3D Lya flux PS, we 
extract the resulting BAO signature. Note that in this 
work we have simply given each individual line-of-sight 
the same weighting in the reconstruction (i.e. one for each 
Lya spectrum and zero for no Lya spectrum). Realisti- 
cally, each weighting will vary according to the quality 



of the Lya forest spectra ( McDonald fc Eisenstein||2007 
McQuinn fc White|201l) >. 



6.3 Measurement uncertainties 

There are two contributions to the uncertainty on the re- 
constructed BAO signal we consider here; cosmic variance 
and shot noise. We shall estimate the former by using 
eight further 1 Gpc 3 simulations using different random 
seeds to generate the initial conditions. For the latter we 
shall adopt a Monte-Carlo error bar estimate. 



6.3.1 Cosmic variance 

In Figure [7] we estimate the fractional error on the power 
spectrum measurement due to cosmic variance. We use 
nine different 1 Gpc simulations to generate the 3D mat- 
ter (left panel) and Lya forest power spectra (right panel) 
for each realisation. The Lya power spectra from each 
simulation box is reconstructed using 20, 000 noiseless 
spectra at a density of ~ 250 deg -2 . The shot noise error 
due to under-sampling becomes negligible for this artifi- 
cially high background source density. Using these sim- 
ulations we then estimate the cosmic variance error by 
measuring the l-cr variations in the power spectra across 
the 9 different boxes, shown as the dashed curves in Fig- 
ure [7] These are compared to the theoretical expression, 
displayed as the solid curves in each panel, given by equa- 



tion (2) in Blake fc Glazebrook (20031, 



2 =2 (2tt) 3 1 
P) V 47rfc 2 Afc' 



(8) 



which gives the error on a power spectrum measurement 
averaged over spherical fc-bins of width Ak. For both the 
matter and Lya forest power spectra the theoretical ex- 
pression is in good agreement with our estimated cos- 
mic variance errors except at the largest scales, k < 0.015 
Mpc -1 . Hence we choose to use equation ^ for cosmic 
variance in the remainder of this work. 



6.3.2 Shot noise 

We adopt a Monte-Carlo approach for estimating the shot 
noise on our extracted power spectrum. We randomly 
sample our 1 Gpc 3 Lya simulation (containing 100,000 
lines-of-sight over ~ 79 deg 2 ) and generate subsamples 
of 1200 spectra, yielding ~ 15 spectra per square degree. 
For each randomly generated subsample, we complete the 
PS reconstruction process outlined in Section |6.2| We 
perform this procedure 100 times using spectra which 
have four different signal-to-noise ratios (S/N=2, 5, 10 
and 20) , and estimate the full covariance matrix and 1-<t 
error for each fc-bin. 



6.3.3 Error bar estimates 

In Figure [8] we compare the reconstructed BAO signature 
generated from our mock Lya forest data to the input 
BAO signature generated from the matter PS for different 
assumptions regarding the S/N ratio of the data. The 
spectra have been sampled at a density of 15 quasars per 
square degree (corresponding to the planned density of 
BOSS targets). We also compare our Monte-Carlo error 
bars to the expected cosmic variance error due to survey 
volume (shaded region) computed using equation Q. 

For S/N ratios of 2 and 5 per pixel the Monte-Carlo 
estimates of the errors are larger than the expected cos- 
mic variance error for a ~ 79 deg 2 survey area. As the 
S/N is increased to > 10 the recovery of the BAO sig- 
nature becomes limited by cosmic variance. We reiterate 
that the simulations used a volume of 1 Gpc 3 and that 
the error bars are proportional to the inverse square root 
of the survey area. We use a volume much smaller than a 
survey would require in order to illustrate the size of the 
error bars. 

In Figure [9] we instead compare the Monte-Carlo 
shot noise errors to the cosmic variance error estimates 
while varying the background source density and assum- 
ing a fixed S/N ratio of 5 per pixel. The line-of-sight 
densities are 15, 30, 45 and finally 60 quasars per square 
degree; background source densities larger than 45 per 
square degree are cosmic variance limited for our mock 
survey area of 79 deg 2 . 



6.4 Recovery of the BAO scale 

We now quantify the recovery of the BAO signature from 
our simulations by fitting the recovered 3D Lya flux PS 
with a function dependent on the characteristic BAO 
scale length. Following the approach of Blake & Glaze- 1 
brook ( 2003 1 , we assume a simple two parameter decay- 



ing sinusoidal model for the ratio of the 3D PS with bary- 
onic oscillations and the smooth reference PS. The func- 
tional form of the assumed fitting function is: 



P F {k) 



1.0 + Akexp 



/2iYk\ 



k 



0.1 /iMpc -1 



(9) 



V k A J ' 

where Pf,t is the smoothed reference PS with no baryon 
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Figure 7. Comparison of the measured cosmic variance from nine different 1 Gpc 3 volume simulations (dashed curves) to the 
theoretical expression given by equation (JsJ (solid curves). Left panel: The fractional error on the measurement for the matter 
power spectrum. Right panel: The fractional error on the Lyo flux power spectrum. 



oscillations, A is an arbitrary normalisation constant and 
Ua is the characteristic BAO scale in Fourier space (where 
the characteristic BAO scale sa = 27r/fc/i). To determine 
the two unknown parameters we perform a \ 2 minimisa- 
tion for the ratio of the PS obtained from our simulation 
to the function in equation |9|, such that: 



X{P) = 5Z5Z C,; J 1 [- P ratlo(&<) --Pratio,Ht(p,A!i)] 
*=1 j = l 

X [Pratio(fcj) - Pratio,flt(p, kj)], (10) 

where p contains the parameters for the fitting formula 
(A and &a), and Cij -1 is the inverse of the covari- 
ance matrix generated from our Monte-Carlo procedure. 
Here Pratio(fc) is the ratio of the two simulated PS (with 
and without the baryon oscillations) and P ra tio,flt(p, k) is 
given by equation (IM. We restrict the range of our \ 2 
minimisation to wavenumbers below k — 0.25 Mpc -1 , 
where the summation index denotes the summation over 
k space bins in our simulated PS. 

The best fit values for sa = 2iv/kA are summarised 
m Table [l] for various S/N ratios (including the idealised 
case of noiseless data) assuming a background source den- 
sity of either 15 or 45 deg -2 . The 1-a relative errors on sa 
are again estimated by using 100 Monte-Carlo subsam- 
ples of mock Lya? data. For each subsample we performed 
the x 2 minimisation and used the distribution of recov- 
ered values to estimate the 1-a relative error, Asa- The 
results are shown for a 1 Gpc 3 volume with an area of 
79 deg 2 , and are also shown after scaling to 2000 deg 2 (as- 
suming Asa is proportional to the inverse square root of 
the survey area) . We do not quote the best fit parameters 
and their associated errors for S/N= 2, as we could not 
recover the characteristic BAO scale at all for our small 
simulated volume. Even for S/N= 5, the recovery of the 
BAO signal is fairly poor. However, this is to be expected 
for the small survey area used here, and an increase in 
the survey area or S/N would improve the fractional er- 



ror on the recovered BAO scale considerably ( |McDonald| 
fc Eisenstein|200"7 I. We nevertheless find that we are able 
to recover the BAO scale from our 1 Gpc 3 simulations to 
within a few percent for S/N> 5. 

In Table [2] we again provide the best fit values for 
sa, but we instead perform the \ 2 minimisation using 
the Monte-Carlo shot noise errors added in quadrature 
with the cosmic variance errors from equation pj. We 
find that for 15 quasars per square degree, increasing the 
signal to noise continues to reduce the fractional error on 
the BAO scale. However for 45 quasars per square degree 
the fractional error saturates for S/N > 10, indicating the 
mock survey is cosmic variance limited for these param- 
eters. 



As a consistency check, we also obtain the recov- 
ered BAO signal from noiseless spectra. We performed 
the same 100 subsample estimation of the error bars both 
ignoring the estimated sample variance (Table [l) and in- 
cluding the sample variance (Tabled . We recover a BAO 
scale of 152.4 comoving Mpc (shot noise error only, 45 
deg -2 ), compared with the BAO scale of the input PS 
which was 152.5 comoving Mpc. This confirms that our 
semi-analytical simulations provide a reasonable descrip- 
tion of correlations in the density field on large scales. 



We note that the two parameter fitting formula used 
in equation (|9| does not provide a perfect description of 
our simulated data. There is a small reduction in the am- 
plitude of the power spectrum toward small scales which 
arises following the mapping and smoothing of the initial 
linear density field described in Section 2. This means 
the oscillations do not occur exactly around one mean 
value as expected in the fitting formula, and this affects 
the recovery of the arbitrary constant, A, in equation (J9p . 
Importantly, however, this does not affect the recovery of 
the BAO scale from our simulations. 
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Figure 8. The data points with 1-cr error bars display the BAO signature recovered from the 3D Lya flux PS generated from a 
mock Lya dataset containing 1200 (at ~ 15 per square degree) lines of sight for a 79 deg 2 survey (see text for details). Clockwise 
from the top left, the recovered BAO signature is extracted from spectra constructed with S/N ratios of 2, 5, 20 and 10. The error 
bars for varying S/N are generated using the Monte-Carlo approach described in Section |6.3.2| For comparison, the solid curve 
in each panel is the expected BAO signature generated from the ratio of the input linear dark matter power spectra with and 
without the baryon oscillation features. The shaded region, which is the same in each panel, displays an estimate of the cosmic 
variance error using equation <|SJ> for our mock survey volume of 1 Gpc 3 . 



Number Density 


S/N 


s A (2Tr/k A ) 


As A 


As a (2000 deg 2 ) 


(deg" 2 ) 




(comoving Mpc) 


(per cent) 


(per cent) 


15 


5 


150.7 


12.70 


2.53 




10 


157.4 


8.51 


1.70 




20 


154.2 


2.64 


0.52 




Noiseless 


153.1 


0.63 


0.13 


45 


5 


155.7 


5.73 


1.14 




10 


152.0 


2.41 


0.48 




20 


151.9 


1.23 


0.24 




Noiseless 


152.4 


0.30 


0.06 



Table 1. The recovered best fitting values for the BAO scale, sa, along with the fractional 1-cr errors obtained by applying the \ 2 
minimisation described by equation ( | 1 [ > to the simulated data displayed in Figure [8] Results are shown for a background source 
density of 15 and 45 deg -2 and for S/N ratios of 5, 10 and 20. The x 2 minimisation is performed using only the Monte-Carlo shot 
noise error estimates. The final row also provides the parameters recovered from noiseless data. For comparison, the input value 
is sa = 152.5 comoving Mpc. The final column contains the fractional error on the BAO scale after scaling our simulation result 
to a 2000 square degree survey area (McDonald & Eiscnstcin 2007). 
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Figure 9. Comparison of the Monte-Carlo generated shot noise errors (error bars) and the cosmic variance errors (shaded region) 
for varying background source density and a fixed S/N of 5 per pixel. The Monte-Carlo shot noise errors are generated following 
the outline in Section |6.3.2| Clockwise from the top left, the recovered BAO signature is extracted from sight-line densities of 15, 
30, 60 and 45 deg -2 . 



Number Density 


S/N 


s A (2n/k A Mpc) 


As A 


As A (10 4 deg 2 ) 


(deg- 2 ) 




(comoving Mpc) 


(per cent) 


(per cent) 


15 


5 


150.7 


15.51 


1.38 




10 


152.3 


14.59 


1.30 




20 


152.0 


6.28 


0.56 




Noiseless 


153.5 


3.75 


0.33 


45 


5 


155.6 


9.12 


0.81 




10 


152.6 


5.66 


0.50 




20 


151.3 


4.46 


0.40 




Noiseless 


151.1 


4.24 


0.38 



Table 2. As for as Table 1, except now showing the recovered best fitting parameters for the BAO scale, s A , using both the 
Monte-Carlo shot noise errors and the cosmic variance summed in quadrature. The final column contains the fractional error on 
the BAO scale after scaling our simulation result to a 10 4 square degree survey area. 



6.5 Comparison to other work 

Several other authors have recently described simulations 
designed to recover the BAO signature from the Lya for- 
est. It is therefore constructive to compare the results 
presented in this work to the approaches taken by other 



studies using our results from Table [T] Firstly, : McDonald] 



& Eisenstein ( 2007 \ used analytical arguments to predict 



that one should be able measure the radial and transverse 
distance scales used for BAO measurements to within a 
fractional error of ~1.4 per cent for S/N = 1.8 per pixel, 
~40 quasars per square degree and a survey area of 2000 
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square degrees. For a S/N of 5 per pixel over ~79 square 
degrees for ~45 quasars per square degree, we obtain a 
fractional error on the BAO scale of ~5.7 per cent. Scal- 
ing our survey area to 2000 square degrees (assuming the 
error scales as the inverse square root of the survey area) 
we would expect a fractional error of ~1.1 per cent (final 
column Table [Tj), co nsistent with the results of McDonald 
|fc Eisenstein] |2007l ). 

Large N-body simulations were used by |Slosar et al.| 



( 2009 \ to measure the correlation function and also the 
cross-PS, rather than the 3D PS. Each dark matter N- 

contained 



body simulation used by Slosar et al 



([20091 

3000 a particles in a box of size 1500 /i -1 Mpc, which does 
not fully resolve the Jeans scale. However, the authors ar- 
gue this should not affect the power on the acoustic scale. 
They conclude that the correlation function provides a 
better method for recovery of the BAO signal compared 
to both the 3D flux PS and the cross-PS, which is con- 



sistent with the approach taken by White et al. (20101 



Slosar et aL] ( |2009[ ) fit for the BAO peak at * = 2.5 using 
~56 quasars per square degree over a total area of ~ 400 
square degrees. For noiseless spectra we find a fractional 
error on the BAO scale of 0.30 per cent compared to 0.53 
per cent from Slosar et al. (20091. The survey area of 



Slosar et al. ( 2009 1 is ~5 times larger than ours, imply- 
ing an error that should be ~2.2 times smaller than our 
error. The origin of this difference is unclear, but may be 
in part due to the use of the diagonal covariance by |Slosar] 



et al. ( 2009 I, as opposed to the full covariance matrix we 
use in our best-fit parameter estimation. 

Finally, we compare our results to those of McQuinn 



& White (20111 who provide sensitivity estimates for 
large Lya forest surveys. Although a direct comparison 
here is less straightforward, using their table 4, a back- 
ground source density of ~ 15 deg -2 corresponds to an 
n e// = 1.4 x 10 -3 Mpc -2 at z = 2.5. However, at z = 2.5 
for S/N = 2, the effective number density varies from 
the true number density by roughly 0.59 (their tab le 3) 
and so n e // ~ 0.8 x 10 -3 Mpc -2 . From figure 8 in Mc- 
|Quinn fc White| ( |2011[ ) , this gives the fractional precision 
in the angular diameter distance and Hubble expansion of 
~ 2 — 3 per cent for a ~ 10 4 square degree survey volume 
at z — 2.5. Scaling our z — 3 results for the fractional 
precision of the BAO scale assuming S/N of 2 (Section 
[7]), a survey area of 10 4 deg 2 and 15 quasars per square 
degree, we find a fractional precision of ~ 2.5 per cent 



on the BAO scale. McQuinn & White (20111 also find 



that the amount of information obtained from a quasar 
is maximised for S/N ~5-10, consistent with our findings 
111 Figure [8] 



7 APPROXIMATE SCALING RELATIONS 
FOR THE FRACTIONAL ERROR 

To summarise our results we provide approximate scal- 
ing relations for the expected recovered fractional error 
on the BAO scale as a function of both survey S/N and 
quasar number density. We approximate this using a sim- 
ple power law expression, 



Error 


Number Density 


a 


b 




(sq. deg.) 






Shot noise 


15 


52.85 


-0.91 




45 


26.50 


-1.04 


Shot noise 


15 


41.34 


-0.54 


+cosmic variance 


45 


27.55 


-0.66 



Table 3. Best fit parameters for a and b in equation ( | 1 1 [ > for 
the fractional error on sa for varying S/N (the models used to 
obtain the scaling fit have S/N=2, 3.5, 5, 7.5, 10, 15 and 20). 
Scalings are given for background source densities of both 15 
and 45 quasars per square degree, and a total survey area of 
79 deg 2 . Top row: Fit parameters for Monte-Carlo shot noise 
error bars only. Bottom row: Fit parameters for shot noise and 
cosmic variance errors added in quadrature. 



Error 


S/N 


a 


b 


Shot noise 


5 


99.12 


-0.78 




10 


140.39 


-1.14 


Shot noise 


5 


76.16 


-0.61 


+cosmic variance 


10 


53.35 


-0.55 



Table 4. Best fit parameters for a and b in equation |TTJ for 
the fractional error on sa for varying quasar sightline density 
(the models used to obtain the scaling fit have 5, 10, 15, 22.5, 
30, 45 and 60 deg -2 ). Scalings are given for both S/N = 5 and 
10, and a total survey area of 79 deg 2 . Top row: Fit parameters 
for Monte-Carlo shot noise error bars only. Bottom row: Fit 
parameters for shot noise and cosmic variance errors added in 
quadrature. 



fractional error (per cent) = ax 



b / Survey area\ 1//2 
V 79 deg 2 ) 



(11) 



where x is either S/N (Table [3| or quasar number density 
(Table |4| and perform a least squares fit on our simula- 
tion results. We provide the best fit parameters a and b 
for the fractional error fit for the shot noise errors alone, 
as well as the shot noise added in quadrature with the 
cosmic variance. The fractional error also scales as the 
inverse square root of the survey area. 

7.1 Scaling constraints to a BOSS-like survey 

Using the best fit parameters from our model for the 
BAO scale from Table [2j we can estimate the accuracy 
of forthcoming large volume BAO Lya forest surveys. 
BOSS anticipates the detection of 150,000 quasars in a 
total survey area of 10 4 deg 2 , with S/N = 5 and a source 
number density of 15 deg -2 . Scaling our simulation re- 
sults from Table [2] we anticipate a detection of the BAO 
scale to within ~1.4 per cent including cosmic variance. 



8 CONCLUSION 

A series of recent studies have used large volume, high 
resolution N-body and hydrodynamical simulations to 
study the detection of BAO in forthcoming large volume 
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Lya forest surveys (such as BOSS). One limitation of 
these simulations is the large computational cost required 
for a simulation of sufficient volume. On the other hand, 
in order to study the systematics involved in the detection 
of the BAO signal, it will be critical to run many simu- 
lations to fully probe parameter space. In this work we 
have demonstrated that a semi-analytical model utilising 
a density field calibrated against a hydrodynamical sim- 
ulation can be used to produce very large volume, high 
resolution simulations at a fraction of time and computa- 
tional cost, and at a reasonable level of accuracy. We find 
good quantitative agreement between the semi-analytical 
model and the hydrodynamical simulation for a range of 
observables. In particular, we are able to reproduce one 
and two-point statistics (the flux PDF and PS), which 
are in reasonable agreement with observational data. We 
stress that in this work we assume a constant redshift, 
z — 3, whereas one expects that the accuracy of the re- 
covered BAO scale with be redshift dependant, and hence 
our results will differ slightly for different redshifts. 



We used our model to generate mock Lya forest 
data sets drawn from a 4096 3 1 Gpc 3 simulation volume. 
We demonstrated we are able to recover the BAO signal 



through reconstruction of the 3D Lya PS ( McDonald & 
Eisenstein|2007 1 . We also recover the characteristic BAO 
scale length by applying a \ 2 minimisation with a sim- 



ple two parameter fitting function (Blake & Glazebrook 



2003 1. We used Monte-Carlo realisations of the Lya forest 
data to estimate the 1 — a uncertainties on the PS due to 
shot noise for varying S/N and background source den- 
sities, and include an estimate of cosmic variance error 
from our nine 1 Gpc 3 simulations. Our mock surveys of 
~ 15 quasars per square degree over ~ 79 square degrees 
with S/N = (5, 10, 20, oo) yield relative errors of (12.7, 
8.51, 2.64, 0.63) per cent and for ~ 45 quasars per square 
degree (5.73, 2.41, 1.23, 0.30) per cent on the recovered 
BAO scale. The accuracy to which we can recover the 
BAO scale with square root of the inverse volume, and 
are consistent with the predictions presented b y|McDon- 
ald fc Eisenstein| ( |2007[ ), |Slosar et al.| ( |2009[ ) and |McQuinn 
fc White| 



Using the results of our mock Lya analysis, we an- 
ticipate that for a S/N = 5 with 15 quasars per square 
degree for a BOSS-like survey of 10 4 deg 2 , one should 
expect a fractional error of ~ 1.4 per cent on the BAO 
scale. We also provide simple scaling relations for esti- 
mating the expected fractional error on the BAO scale 
given the number density of quasars and the signal to 



The method presented here enables generation of 
large scale realisations of the IGM density field and Lya 
forest quickly and efficiently. It is therefore ideal for inves- 
tigating the key systematics which will impact on BAO 
detection, such as errors in the continuum shape and the 
effect of non-gravitational fluctuations on the Lya forest, 
including large-scale temperature and ionisation varia- 
tions in the IGM at z = 3. 
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APPENDIX A: GPGPU 

In this appendix we describe the use of GPGPU (Gen- 
eral Purpose computing on Graphics Processing Units) 
programming in our simulations. Our code has been im- 
plemented in three formats, single core, parallelised mul- 
ticore and a mix of parallel multicore and GPU program- 
ming. The larger simulations used in this paper are per- 
formed using both a parallel multicore and a single GPU. 
For all simulations we use an Intel Xeon 2.00Ghz quad 
core CPU and a nVidia FX580 CUDA enabled graphics 
card. 

Over the last few years major steps have been taken 
in the implementation of GPU programming into many 
astrophysical applications. Implementations of current 
astrophysical simulations with GPUs can report upward 
of a 10-100 factor speed up in computational time (see 
Fluke et al.|2010" and references therein) . One of the ma- 
jor problems for GPU programmers is how to take full 
advantage of a GPU for solving computational problems. 
One must maximise usage of on-chip resources, while al- 
lowing as much of the calculation to be run without in- 
tervention from the host CPU. One must also avoid data 
dependency, where a result at one point in the data can 
impact on the outcome of a separate piece of data (i.e. 
data must be as independent as possible). The transfer 
of data from the CPU and GPU can also limit the ef- 
fectiveness of the GPU programming application, and is 
dependent on the details of the individual's computer. 

The goal of our simulations is to provide a model 
that can be used to investigate the Lya forest. To accom- 
plish this we only generate a limited number of sightlines 
per simulation rather than the entire density field (i.e. 
we generate our density field in Fourier space, and only 
Fourier transform the number of sightlines required). Our 
semi-analytical model is perfectly suited for implementa- 
tion on a GPU, allowing us to quickly run mock survey 
simulations in less than a day. 

In Figures |A1| and |A2[ we show the increase in per- 
formance gained by using a GPU for certain functions 
in our simulation, relative to single and parallelised quad 
core implementations. Figures [Al| and [A2| show the total 
runtime of the individual section of the code used to gen- 
erate the Fourier space density field and for calculating 
the spherically averaged 3D PS. We scale up the number 
of pixels along the length of the simulation cube in pow- 
ers of two, from 256 to 4096. We include in the timing 
all required overhead, such as data transfer from CPU to 
GPU and memory allocation. 

For calculation of the Fourier space density field we 
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Figure Al. Computation time for the generation of the den- 
sity field in Fourier space as a function of the pixel number per 
side of a simulation cube. The solid curve shows the timings 
for a single core calculation, the dashed for quad core, and 
the dot-dashed curve is obtained using the GPU. The timing 
takes into account all required overhead such as data transfer 
between devices and memory allocation. 

observe a reduction in runtime by roughly a factor of 4 
on moving from the single to quad core implementation, 
with a further factor of 4 from quad core to GPU. For the 
3D PS, we observe a factor 3 decrease in runtime from 
single to quad core, and a factor of 8 decrease from quad 
to GPU. Although our the decrease in runtime is only a 
factor of ~4-8 (relative to the parallelised CPU) for our 
two chosen processes, this is mainly due to the specific 
GPU used. Our GPU contains only 1.12Ghz clock speed, 
with a maximum of 32 cores, whereas top of the line 
GPGPUs allow up to as many as 440 cores with clock 
speeds of 1.3Ghz. Implementation of our GPU enabled 
code onto one of the newest devices would facilitate the 
10-100 factor increase in computational speed. 

A considerable amount of computational time is 
taken up by Fourier transformation of our simulated data. 
Although not implemented in the current version of our 
code, initial testing of the inbuilt FFT libraries provided 
show an expected additional factor of ~10 reduction in 
computation time, which would further reduce our total 
computation time. 



APPENDIX B: MEMORY LIMITATIONS 
WHEN GENERATING LARGE SIMULATION 
BOXES 

In this work we have used our code to generate 4096 3 sim- 
ulation boxes on a desktop PC. However on a desktop PC 
we are memory limited and cannot store the entire box 
in memory at once. We circumvent this problem by using 
the natural symmetry of the density field in Fourier space 
to break up our large simulation volume into 8 smaller 
simulation volumes. However, even these 8 smaller simu- 
lation volumes cannot be fully read into memory at one 
time, and hence we only read into memory a small section 
of the volume at any one time. The advantage of such an 
approach is that our code can be performed easily on any 
desktop computer. 



Figure A2. Computation time for the generation of the 3D 
PS as a function of the pixel number per side of a simulation 
cube. The curves are as described for Figure |A"T| 

Instead of computing the 3D FFTs (Fast Fourier 
Transform) which would require the full simulation to 
be read into memory, we again use the symmetry of the 
density field and Fourier transform in 2D slices across 
our large simulation volume (reading in the necessary 4 
smaller simulation volumes for each 2D slice). We then 
only Fourier transform out the final ID lines of sight that 
we have randomly generated throughout the full simula- 
tion volume. This final step of only Fourier transforming 
the final ID line of sight is well suited for generating mock 
Lya forest surveys. 
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